Application Architecture
1. A developer's perspective
- Developers write code that is deployed to a server. For now, let's define a server as a computer that handles requests from another computer.
- This server also needs persistent storage for the application's data, so it may talk to an external storage system (a database, cloud storage, etc.). That storage is often not part of the server itself; instead, it is connected over a network (see the sketch below).
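As a rough illustration of both ideas, here is a minimal sketch in Python's standard library: a program that listens for requests from another computer and persists data. SQLite stands in for what would usually be an external database reached over the network; the file name and table are illustrative.

```python
# A minimal "server": listens for HTTP requests and persists data.
# SQLite stands in here for an external database reached over a network.
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("app.db")
db.execute("CREATE TABLE IF NOT EXISTS visits (path TEXT)")

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Persist something about the request, then respond to the client.
        db.execute("INSERT INTO visits VALUES (?)", (self.path,))
        db.commit()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello from the server!\n")

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```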
2. A user’s perspective
A user is someone who makes a request to the server, usually through a web browser.
- If a user requests a front-end feature, the server responds with the JavaScript/HTML/CSS code needed for the browser to render what the user requested.
- A user can also make a request directly to the server's back-end API; the server responds with raw data, commonly in JSON, though other formats are possible (see the sketch below).
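A small sketch of both kinds of requests from the user's side, using Python's standard library; the URLs are hypothetical placeholders for a real application's endpoints.

```python
# The two kinds of requests a user (or their browser) typically makes.
import json
from urllib.request import urlopen

# Front-end request: the server responds with HTML/CSS/JS to render.
page = urlopen("http://localhost:8080/index.html").read()

# Back-end API request: the server responds with raw data, here JSON.
raw = urlopen("http://localhost:8080/api/users").read()
users = json.loads(raw)  # e.g. [{"id": 1, "name": "Ada"}, ...]
```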
3. Scaling our server
1. Vertical vs. Horizontal Scaling
But what if we have a lot of users and a single server cannot handle all of the requests on its own?
Solution 1: We can identify the bottleneck (maybe the CPU is not fast enough, or maybe we don't have enough RAM) and upgrade our single server's hardware so it performs better. → vertical scaling (taking a single resource and making it more powerful.)
Vertical scaling allows more requests to be handled by a single server.
Solution 2: Because we cannot scale a single server infinitely, we can use horizontal scaling and run our code on multiple servers. Users no longer talk to a single server but to many → we can handle more requests at the same time → better performance. If one server goes down, we can direct its traffic to the other servers → fault tolerance.
Horizontal scaling allows more requests to be handled by multiple servers.
For simple applications, vertical scaling may be sufficient and is easier. For large systems, we prefer horizontal scaling because it is much more powerful and can be achieved with relatively inexpensive, commodity hardware.
However, it also requires much more engineering effort: we need to ensure that the servers can communicate with each other and that user requests are distributed evenly among them.
- When we have multiple servers handling requests, we should add a load balancer (a device or software program that distributes incoming traffic across a group of servers) to forward each request to the server currently handling the least traffic, as sketched below.
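A minimal sketch of that "least traffic" idea: track the number of in-flight requests per server and forward each new request to the server with the fewest. The server addresses are hypothetical, and a real load balancer would also handle health checks and failover.

```python
# Least-connections load balancing: pick the server with the fewest
# in-flight requests.
class LeastTrafficBalancer:
    def __init__(self, servers):
        # Map each server address to its number of in-flight requests.
        self.in_flight = {server: 0 for server in servers}

    def acquire(self):
        # Pick the server currently handling the least traffic.
        server = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[server] += 1
        return server

    def release(self, server):
        # Call once the server finishes handling the request.
        self.in_flight[server] -= 1

balancer = LeastTrafficBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
server = balancer.acquire()   # forward the incoming request to this server
# ... proxy the request, then:
balancer.release(server)
```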
It's also important to remember that servers don't exist in isolation: they are likely interacting with external servers through APIs. For example, an e-commerce website's server might use a payment gateway API to process customer credit card transactions, talking to a payment processor's servers such as PayPal or Square.
2. Logging and Metrics
Servers also run logging services, which give developers a record of the activity on the server. Logs can be written to the same server, but for better reliability they are commonly shipped to a separate, external server (see the sketch below).
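A sketch of shipping logs off the server, assuming Python's standard logging module; the log collector's address and path are hypothetical placeholders.

```python
# Write logs locally, and also ship them to an external logging server
# so they survive even if this machine goes down.
import logging
from logging.handlers import HTTPHandler

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)

# Local fallback: write to a file on the same server.
logger.addHandler(logging.FileHandler("app.log"))

# More reliable: also POST each record to an external log collector.
logger.addHandler(HTTPHandler("logs.example.com:9000", "/ingest", method="POST"))

logger.info("request handled: path=/api/users status=200")
```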
When RAM becomes the bottleneck, or constrained CPU resources prevent requests from being handled efficiently, we need a metrics service to identify the bottleneck. A metrics service collects data from different sources within our server environment (CPU usage, network traffic, etc.), giving us more insight into the server's behavior.
- If we log every time a user receives a failed response, we can use those logs to build a metric for how many requests are failing.
We can set up alerts on these metrics so that whenever a metric fails to meet a target, developers receive a notification.
- For example, if we expect nearly 100% of user requests to receive successful responses, we could set an alert that notifies us if the success rate dips below 95%, as sketched below.
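A sketch of that alerting idea: derive a success-rate metric from request counts and notify developers when it falls below the target. The notify_developers function is a hypothetical stand-in for a real paging or email service.

```python
# Derive a success-rate metric and alert when it misses its target.
SUCCESS_RATE_TARGET = 0.95

def notify_developers(message: str) -> None:
    print(f"ALERT: {message}")  # a real system would page or email here

def check_success_rate(total_requests: int, failed_requests: int) -> None:
    if total_requests == 0:
        return
    success_rate = (total_requests - failed_requests) / total_requests
    if success_rate < SUCCESS_RATE_TARGET:
        notify_developers(
            f"success rate {success_rate:.1%} fell below {SUCCESS_RATE_TARGET:.0%}"
        )

check_success_rate(total_requests=1000, failed_requests=80)  # fires: 92.0% < 95%
```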